PoseFlow: A Deep Motion Representation for Understanding Human Behaviors in Videos
Abstract
Motion of the human body is a critical cue for understanding and characterizing human behavior in videos. Most existing approaches exploit this cue through optical flow. However, optical flow typically captures motion of both the human bodies of interest and the undesired background.
This "noisy" motion representation makes pose estimation and action recognition very challenging in real-world scenarios.
Methodology
To address this issue, this paper presents a novel deep motion representation, called PoseFlow, which reveals human motion in videos while:
• Suppressing background motion and motion blur
• Remaining robust to partial occlusion (a conceptual sketch of such a human-only flow follows this list)
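As a rough illustration of what a human-only flow looks like (not the paper's learned implementation), one can rasterize the displacements between matched skeleton keypoints into a flow map that is zero everywhere off the body. The helper `keypoint_flow` and the disc radius below are our own illustrative choices.

```python
# Illustrative sketch only: rasterize displacements between matched
# skeleton keypoints into a human-only flow map. The paper's PoseFlow
# is learned end-to-end; this hand-built version just conveys the idea.
import numpy as np

def keypoint_flow(kpts_t, kpts_t1, shape, radius=4):
    """kpts_t, kpts_t1: (K, 2) matched (x, y) keypoints in frames t, t+1."""
    h, w = shape
    flow = np.zeros((h, w, 2), dtype=np.float32)
    for (x0, y0), (x1, y1) in zip(kpts_t, kpts_t1):
        # Paint the joint's displacement into a small disc around it,
        # leaving the background at exactly zero motion.
        ys, xs = np.ogrid[:h, :w]
        mask = (xs - x0) ** 2 + (ys - y0) ** 2 <= radius ** 2
        flow[mask] = (x1 - x0, y1 - y0)
    return flow

# Toy example: one joint moving 3 px right; the background stays zero.
f = keypoint_flow(np.array([[10, 10]]), np.array([[13, 10]]), (32, 32))
print(f[10, 10], f[0, 0])  # -> [3. 0.] [0. 0.]
```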
To learn PoseFlow at a mild computational cost, we propose a functionally structured spatial-temporal deep network, called PoseFlow Net (PFN), which jointly solves the two problems that define PoseFlow: skeleton localization and cross-frame skeleton matching. Sharing a single functionally structured network across both tasks is what keeps the representation cheap to learn (a hypothetical architecture sketch follows).
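The following is a hypothetical sketch of the joint localization-and-matching idea, assuming a shared trunk with two heads: per-joint heatmaps for skeleton localization and a 2-channel displacement field for matching. Layer sizes, channel counts, and the class name `PoseFlowNetSketch` are placeholders, not the paper's actual PFN design.

```python
# Hypothetical sketch of the joint localization + matching idea behind
# PFN; layers and heads are placeholders, not the paper's architecture.
import torch
import torch.nn as nn

class PoseFlowNetSketch(nn.Module):
    def __init__(self, num_joints=16):
        super().__init__()
        # Shared trunk over a pair of frames (6 channels = 2 RGB frames).
        self.trunk = nn.Sequential(
            nn.Conv2d(6, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Localization head: per-joint heatmaps for skeleton localization.
        self.heatmaps = nn.Conv2d(64, num_joints, 1)
        # Matching head: a 2-channel displacement field (the PoseFlow output).
        self.flow = nn.Conv2d(64, 2, 1)

    def forward(self, frame_pair):
        feat = self.trunk(frame_pair)
        return self.heatmaps(feat), self.flow(feat)

net = PoseFlowNetSketch()
heat, flow = net(torch.randn(1, 6, 128, 128))
print(heat.shape, flow.shape)  # (1, 16, 128, 128), (1, 2, 128, 128)
```

The point of the shared trunk is that both heads reuse the same features, so adding the matching task costs little beyond the localization network itself.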
Experimental Results
Comprehensive experiments show that:
• PFN outperforms state-of-the-art deep flow estimation models at generating PoseFlow
• PoseFlow demonstrates its potential for improving two challenging tasks in human video analysis:
1. Pose Estimation: more accurate human pose detection, since PoseFlow concentrates on body motion and filters out background noise
2. Action Recognition: better recognition of human actions thanks to the cleaner motion representation (a usage sketch follows this list)
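As a usage sketch for the action recognition case, a stack of PoseFlow fields can stand in for optical flow as input to the motion stream of a standard two-stream recognizer. The shapes, layer choices, and 101-class output below are illustrative assumptions, not the paper's exact pipeline.

```python
# Usage sketch: feed a stack of PoseFlow fields to a motion-stream CNN,
# following the common two-stream recipe. Shapes/classes are placeholders.
import torch
import torch.nn as nn

T = 10  # number of consecutive PoseFlow fields (2 channels each)
motion_stream = nn.Sequential(
    nn.Conv2d(2 * T, 96, 7, stride=2, padding=3), nn.ReLU(inplace=True),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(96, 101),  # e.g. a UCF101-sized set of action classes
)

poseflow_stack = torch.randn(1, 2 * T, 224, 224)  # cleaner than raw flow
logits = motion_stream(poseflow_stack)
print(logits.shape)  # torch.Size([1, 101])
```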
These results validate that PoseFlow provides a more effective motion representation for understanding human behavior in videos than traditional optical flow.